Analysis of PIN1 WW domain through a simple Statistical Mechanics Model
We have applied a simple statistical-mechanics Go-like model to the analysis of the PIN1 WW domain, resorting to Mean Field and Monte Carlo techniques to characterize its thermodynamics, and comparing the results with the wealth of available experimental data. The PIN1 WW domain is a 39-residue protein fragment which folds into an antiparallel β-sheet, thus representing an interesting model system to study the behavior of these secondary structure elements. Results show that the model correctly reproduces the two-state behavior of the protein, as well as the trends of the experimental $\Phi_T$-values. Moreover, there is good agreement between the Monte Carlo results and the Mean Field ones, which can be obtained with a substantially smaller computational effort.
I. INTRODUCTION

Understanding the folding process of proteins is one of the most challenging issues of biochemistry, and it requires sophisticated simulations at atomic resolution, generally referred to as all-atom methods. At present, the large gap between folding time scales and the regimes explored by all-atom simulations makes the folding process not yet accessible to these powerful computational approaches. Even though very encouraging progress has been achieved, their applicability remains restricted to the study of peptides and fragments of proteins [?].
In addition, the comparison with experiments requires accumulating enough folding events to gain sufficiently large statistics, which further narrows the applicability of all-atom techniques. These limitations suggest resorting to minimalist models which adopt a less accurate description of protein chains and of residue-residue and residue-solvent interactions [?]. Approximate representations reduce the computational costs and, with a certain amount of uncertainty, make it possible to follow all the stages which bring a protein into its native fold. The use of simplified models within a statistical mechanical approach to protein folding is grounded on the assumption that not all the chemical details need to be retained to understand and describe the basic properties of folding processes. Of course, the approximations introduced by this kind of approach must still respect the basic principles of biochemistry, so as to keep a correct description of the real molecules. Several years ago a simple model
was proposed by N. Go [8] to attain a phenomenological but complete description of the folding reaction. The model replaces all non-bonded interactions by attractive native-state contact energies. This recipe, which can be applied only when the native structure is known, implements the idea that a reasonable energy bias toward the native state can capture the relevant features of the folding process. This kind of modelling removes high energy barriers along the pathways toward the native conformation (which lies in a deep minimum), and produces relatively smooth energy landscapes. As a result, the folding "funnel" leading to the native state is very smooth, so that the folding process turns out to be "ideal". Folding events simulated through Go-like potentials take only a few nanoseconds, making it possible to obtain statistically meaningful results for generic proteins and polypeptide chains. Since
Go-like models lack any energetic frustration, the scope of their applications is the investigation of the role of geometric frustration and configurational entropy in the folding process. Their success in providing a reasonable account of the kinetic properties of the folding process rests on the assumption that folding kinetics is mainly determined by the native geometry, together with the native-state stability, and this view is indeed supported by several experimental works [?]. Along the lines indicated by the Go philosophy, other simplified models exploiting the information present in the native state have been proposed [?]. In this paper we continue our analysis [?] of one of these Go-like models, the Finkelstein model [?], and apply it to the study of the Pin1 WW domain (PDB code 1I6C), which has a well defined and simple native structure made of two slightly bent antiparallel β-sheets. Its distinctive feature, which is also reflected in its name, is the presence of two tryptophans (W), located 20 residues apart from one another. Its structure, with a simple topology, lacks all those features that can complicate the modeling. Thus this molecule represents a suitable candidate to explore the kinetic and thermodynamic factors responsible for the formation of β-sheets and their stability, and is also a suitable benchmark against which to validate models and theories. The Finkelstein model is particularly suitable for analyzing the folding thermodynamics of two-state proteins, and the WW domain is known to fold in a two-state scenario, so we can test whether the model can faithfully reproduce the known experimental data about WW domain folding.

The organization of the paper is as follows. In Section II we discuss the model and its assumptions. In Section III we present the Monte Carlo and Mean Field methods we adopt, and in Section IV we report and discuss our results. Finally, Section V is dedicated to the concluding remarks.



II. DESCRIPTION OF FINKELSTEIN MODEL 

The Finkelstein model assumes a simple description of the polypeptide chain, where residues can stay only in an ordered (native) or disordered (non-native) state. Each micro-state of a protein with $L$ residues is then encoded in a sequence of $L$ binary variables $s = \{s_1, s_2, \ldots, s_L\}$, $s_i \in \{0, 1\}$. Residues with $s_i = 1$ ($s_i = 0$) are in their native (non-native) conformation. When all variables take on the value 1, the protein is considered folded, whereas the random coil corresponds to all 0's. Because each residue can be in one of two states, ordered or disordered, the free energy landscape consists of $2^L$ configurations. This enormous reduction in the number of configurations available to a protein is a rather delicate point, because it is a restrictive feature of the model. However, this crude assumption, already employed in Ref. [?], is the simplest one leading to a two-state behaviour of the folding.

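The effective Hamiltonian (indeed, a free-energy function) is

$$ H(s) = \epsilon \sum_{i<j} \Delta_{ij}\, s_i s_j - T S(s), \tag{1} $$

where $S(s)$ is given by:

$$ S(s) = R \left[ q \sum_{i=1}^{L} (1 - s_i) + S_{loop}(s) \right]. \tag{2} $$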



$R$ is the gas constant and $T$ the absolute temperature. The first term in Eq. (1) is the energy associated with native contact formation. Non-native interactions are neglected: this further assumption can only be tested a posteriori, and it is expected to hold if, during the folding process, the progress along the reaction coordinate is well described on the basis of the native contacts. That is, the reaction coordinate(s) must be related to just the native contacts. Moreover, such progress must be slow with respect to all other motions, so that all non-native interactions can be "averaged out" when considering the folding pathways. $\Delta_{ij}$ denotes the element $i,j$ of the contact matrix, whose entries are the number of heavy-atom contacts between residues $i$ and $j$ in the native state. Here we consider two amino acids to be in contact if there are at least two heavy atoms (one from amino acid $i$ and one from $j$) separated by a distance of less than 5 Å. The matrix $\Delta$ embodies the geometrical properties of the protein.
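
As an illustration of the contact-matrix definition above, the following minimal Python sketch counts heavy-atom pairs closer than the cutoff. The function name `contact_matrix`, the input layout (one list of heavy-atom coordinates per residue) and the toy coordinates are assumptions made for this example. The treatment of sequence neighbours is not specified in the text, so the sketch simply counts all pairs $i < j$.

```python
import math

def contact_matrix(residues, cutoff=5.0):
    """Count heavy-atom contacts between every pair of residues.

    residues: list with one entry per residue, each entry being a list of
              (x, y, z) heavy-atom coordinates in Angstrom (hypothetical layout).
    Returns a symmetric L x L matrix; entry (i, j) is the number of atom
    pairs, one atom from residue i and one from residue j, closer than cutoff.
    """
    L = len(residues)
    delta = [[0] * L for _ in range(L)]
    for i in range(L):
        for j in range(i + 1, L):
            n = sum(1 for a in residues[i] for b in residues[j]
                    if math.dist(a, b) < cutoff)
            delta[i][j] = delta[j][i] = n
    return delta

# Toy example with three "residues" placed along the x axis.
toy_residues = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(3.0, 0.0, 0.0)],
    [(20.0, 0.0, 0.0)],
]
toy_delta = contact_matrix(toy_residues)
```

In a real application the coordinates would be read from the native structure file, which this sketch does not attempt to parse.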

The second term in Eq. (1) is the conformational entropy associated with the presence of unfolded regions along the chain, and vanishes in the native state.



More precisely, the first term in Eq. (2) is a sort of "internal" entropy of the residues: $qR$ represents the entropic difference between the coil and the native state of a single residue. This can be seen by considering that in the fully unfolded state $S_{loop}$ vanishes and the remaining entropy is just $qLR$.

The term $R S_{loop}$ in Eq. (2) is the entropy pertaining to the disordered closed loops protruding from the globular native state [13]; it reads:



$$ S_{loop}(s) = \sum_{i<j} J(r_{ij}) \prod_{k=i+1}^{j-1} (1 - s_k)\, s_i s_j. \tag{3} $$

According to Ref. [?], we take:

$$ J(r_{ij}) = -\frac{5}{2} \ln|i - j| - \frac{3}{4}\, \frac{r_{ij}^2 - d^2}{A\, d\, |i - j|}. \tag{4} $$



In this way the configuration of a disordered loop going from residue $(i + 1)$ to $(j - 1)$, with $i$ and $j$ in their native positions, is assimilated to a random walk with end-to-end distance $r_{ij}$, the latter being the distance between the C$_\alpha$ atoms of residues $i$ and $j$ in the native state. The parameters $d = 3.8$ Å and $A = 20$ Å are the average distance between consecutive C$_\alpha$ atoms along the chain and the persistence length, respectively. The entropy of one loop closure, Eq. (4), differs from the classical result $-(3R/2)\ln N$ pertaining to a free Gaussian chain [24]. The presence of the factor 5/2, instead of 3/2, stems from the fact that a loop exiting the globule must lie completely outside of it, to account for self-avoidance. Thus, the spatial domain occupied by the globule results in a forbidden region for the disordered loop, and this simple steric constraint, by reducing the number of accessible conformations, increases the entropy loss upon closure of the loop [?].
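
To make the structure of the model concrete, the sketch below evaluates the effective Hamiltonian for a given binary configuration, combining the contact energy with the residue and loop entropies described in this section. It is a minimal illustration, not the authors' code: the function names, the toy contact matrix and the toy distances are hypothetical, the value of `R` is expressed in kcal/(mol K), and the sketch assumes that a loop must contain at least one disordered residue ($j \ge i + 2$), so that the loop entropy vanishes exactly in the native state.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol K)

def loop_weight(r_ij, n, d=3.8, A=20.0):
    """Loop-closure term J(r_ij) for a loop spanning n = |i - j| bonds."""
    return -2.5 * math.log(n) - 0.75 * (r_ij ** 2 - d ** 2) / (A * d * n)

def loop_entropy(s, r, d=3.8, A=20.0):
    """S_loop(s): sum over native pairs (i, j) enclosing only disordered residues."""
    L = len(s)
    total = 0.0
    for i in range(L):
        if s[i] != 1:
            continue
        for j in range(i + 1, L):
            if s[j] == 1:
                # Assumption: adjacent native residues do not form a loop.
                if j - i >= 2:
                    total += loop_weight(r[i][j], j - i, d, A)
                break  # any later j has a native residue in between
    return total

def effective_energy(s, delta, r, eps, q, T):
    """H(s) = eps * sum_{i<j} Delta_ij s_i s_j - T * S(s)."""
    L = len(s)
    contact = eps * sum(delta[i][j] * s[i] * s[j]
                        for i in range(L) for j in range(i + 1, L))
    entropy = R * (q * sum(1 - si for si in s) + loop_entropy(s, r, ))
    return contact - T * entropy

# Toy 4-residue chain (all numbers are illustrative, not fitted values).
toy_delta = [[0, 0, 2, 3], [0, 0, 0, 1], [2, 0, 0, 0], [3, 1, 0, 0]]
toy_r = [[3.8 * abs(i - j) for j in range(4)] for i in range(4)]
toy_r[0][3] = toy_r[3][0] = 5.0
h_native = effective_energy([1, 1, 1, 1], toy_delta, toy_r, eps=-0.1, q=2.31, T=332.0)
h_coil = effective_energy([0, 0, 0, 0], toy_delta, toy_r, eps=-0.1, q=2.31, T=332.0)
```

In the fully native configuration only the contact energy survives, while in the fully disordered one the result reduces to $-qLRT$, in line with the discussion of the two entropy terms.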



III. METHODS 

A direct comparison between model predictions and experimental results requires tuning the coefficients $q$ and $\epsilon$ in the energy function, Eq. (1). In our computation we set $q = 2.31$ and regarded $\epsilon$ as an adjustable parameter. We determined it by imposing that the mean-field specific heat exhibits its "collapse" peak at the experimental transition temperature $T = 332$ K [21]. Despite the use of a simple MF approach, we expect this procedure to yield a correct estimate of $\epsilon$, since the MF is known to reproduce the thermodynamic properties of the Finkelstein model quite faithfully [?]. Once the values of $q$ and $\epsilon$ were fixed, we performed Monte Carlo simulations to investigate the thermal folding of the WW domain. We implemented a Metropolis algorithm with transition rates between states $j$ and $k$

$$ w(j \to k) = \exp\left[(H_j - H_k)/RT\right], $$

$R$ being the gas constant, $T$ the temperature and $H_j$ the Finkelstein energy of state $j$, according to Eq. (1).
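
A single Metropolis update with the transition rate given above can be sketched as follows. The move set is an assumption, since the text does not specify it: here one randomly chosen residue is flipped between its native and non-native state. The acceptance probability is capped at one, as is usual for Metropolis dynamics. The names `metropolis_step` and `toy_energy`, and the toy energy itself, are hypothetical.

```python
import math
import random

def metropolis_step(s, energy, T, rng=random, R=1.987e-3):
    """One Metropolis update: flip a random residue, accept with
    probability min(1, exp[(H_old - H_new) / RT])."""
    i = rng.randrange(len(s))
    trial = list(s)
    trial[i] = 1 - trial[i]
    dH = energy(trial) - energy(s)
    if dH <= 0 or rng.random() < math.exp(-dH / (R * T)):
        return trial
    return list(s)

def toy_energy(s):
    """Illustrative energy: -1 kcal/mol per native residue."""
    return -1.0 * sum(s)

rng = random.Random(0)
state = [0] * 5
for _ in range(200):
    # At T = 1 K uphill moves are essentially never accepted.
    state = metropolis_step(state, toy_energy, T=1.0, rng=rng)
```

In an actual simulation `energy` would be the effective Hamiltonian of the model evaluated at the simulation temperature, and energy histograms would be accumulated along the trajectory.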

We applied the multiple histogram technique (MHT) [25] to reconstruct the density of states (DOS) of the system in the full range of accessible energies. To this end, we carried out MC runs at 50 equally spaced temperatures in the range 273-383 K, and for each run we collected the energy histogram to estimate the statistical weight of all configurations with a given energy. Through the Ferrenberg-Swendsen procedure [25], these histograms were optimally linearly combined to extract the whole DOS $g(E)$ and thus compute the entropy $S(E) = R \ln g(E)$ up to an additive constant. The knowledge of the entropy allows evaluating the free energy profiles $F(E) = E - T S(E)$, and other thermodynamic quantities relevant for the folding, such as the specific heat.
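
Once an estimate of $\ln g(E)$ is available, the quantities mentioned above follow from simple sums. The sketch below is only an illustration of that last step: it takes $\ln g(E)$ on an energy grid as given, does not implement the multiple-histogram reweighting itself, and treats $E$ as a temperature-independent energy. The function name `thermodynamics` and the two-level toy data are hypothetical.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol K)

def thermodynamics(energies, ln_g, T):
    """Return (mean energy, specific heat, free-energy profile F(E)) at temperature T.

    energies: list of energy values E (kcal/mol)
    ln_g:     list with the logarithm of the density of states at each E
    """
    beta = 1.0 / (R * T)
    log_w = [lg - beta * e for e, lg in zip(energies, ln_g)]
    shift = max(log_w)  # subtract the maximum to avoid overflow
    w = [math.exp(x - shift) for x in log_w]
    z = sum(w)
    e_mean = sum(e * wi for e, wi in zip(energies, w)) / z
    e2_mean = sum(e * e * wi for e, wi in zip(energies, w)) / z
    c_v = (e2_mean - e_mean ** 2) / (R * T * T)
    profile = [e - T * R * lg for e, lg in zip(energies, ln_g)]  # F(E) = E - T S(E)
    return e_mean, c_v, profile

# Two-level toy system tuned so that both levels are equally populated at 300 K.
T_toy = 300.0
toy_E = [-1.0, 0.0]
toy_ln_g = [0.0, 1.0 / (R * T_toy)]
e_mean, c_v, profile = thermodynamics(toy_E, toy_ln_g, T_toy)
```

For the toy data the two minima of the profile are degenerate at 300 K, which is the analogue of the folding temperature in a two-state picture.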

In its variational formulation [?], the Mean Field Approximation (MFA) for a system with Hamiltonian $H$ and corresponding free energy $F$ amounts to minimizing the variational free energy $F_{var}$ defined by

$$ F \le F_{var} = F_0 + \langle H - H_0 \rangle_0, \tag{5} $$

where $H_0$ is a solvable trial Hamiltonian and $F_0$ is the corresponding free energy, both depending on free parameters $x = \{x_1, \ldots, x_L\}$ (variational parameters), and $\langle \cdot \rangle_0$ denotes the thermal average taken with $H_0$. Minimization leads to self-consistent equations that in their general form read

$$ \frac{\partial}{\partial x_l} \langle H - H_0 \rangle_0 - \left\langle \frac{\partial (H - H_0)}{\partial x_l} \right\rangle_0 = 0, \tag{6} $$

with $l = 1, \ldots, L$. We have implemented different versions of the MFA for the model, which differ from each other in the choice of the trial Hamiltonian.

The standard MFA employs as the trial Hamiltonian:

$$ H_0 = \sum_{i=1}^{L} x_i s_i, \tag{7} $$

with $x_i$ to be determined by minimizing the variational free energy [?]

$$ F_{var}(x, T) = \sum_i f_0(x_i, T) + \langle H - H_0 \rangle_0, \tag{8} $$

where $\sum_i f_0(x_i, T)$ is the free energy associated with $H_0$,

$$ f_0(x_i, T) = -\frac{1}{\beta} \ln\left\{1 + \exp(-\beta x_i)\right\}. \tag{9} $$

Thermal averages performed with the Hamiltonian $H_0$ factorize: $\langle s_i s_j \cdots s_k \rangle_0 = \langle s_i \rangle_0 \langle s_j \rangle_0 \cdots \langle s_k \rangle_0$. The approximate average site "magnetization" $m_i = \langle s_i \rangle_0$ depends only on the field $x_i$, and is given by

$$ m_i = \frac{\partial f_0}{\partial x_i} = \frac{1}{1 + \exp(\beta x_i)}. \tag{10} $$

Instead of working with the external fields $x_i$, it is more intuitive to use the corresponding "magnetizations" $m_i$, writing $F_{var}$ as a function of the $m_i$'s. Due to the choice of $H_0$, Eq. (7), and to the expression Eq. (10), evaluating the thermal average $\langle H \rangle_0$ amounts to replacing, in the Hamiltonian Eq. (1), each variable $s_i$ by its thermal average $m_i$. In the end we get:

$$ F_{var}(m) = \epsilon \sum_{i<j} \Delta_{ij}\, m_i m_j - T S(m) + RT \sum_{i=1}^{L} g(m_i), \tag{11} $$

where $g(u) = u \ln(u) + (1 - u) \ln(1 - u)$ and $S(m)$ is obtained from Eq. (2) by substituting $s_i \to m_i$. The last term corresponds to $F_0 - \langle H_0 \rangle_0$ in Eq. (5): it is the entropy associated with the system with Hamiltonian $H_0$, and is the typical term that stems from this kind of MFA [26]. The minimization of the function in Eq. (11) with respect to $m$ leads to the self-consistent equations:

$$ RT\, g'(m_i) = -\epsilon \sum_{j} \Delta_{ij}\, m_j - RT \left[ q - \frac{\partial S_{loop}(m)}{\partial m_i} \right]. \tag{12} $$

Equations (12) can be solved numerically by iteration and provide the optimal values of the magnetizations, which we denote by $m^*$. Once the set of solutions $m^*$ is available, we can compute the variational free energy $F_{var}(m^*)$, which represents the best estimate of the system free energy $F$. Free energy profiles are evaluated by performing the minimization after the introduction of Lagrange multipliers, corresponding to the constraint of considering states with a fixed number of native residues.

A different MFA consists in taking a trial Hamiltonian that accounts exactly for the entropic term of the original one, resorting to the procedure introduced in Ref. [23], and approximates the interactions by introducing a weight dependent on the number of native residues in the configuration. Namely, we consider the set of configurations of the protein with $M$ native residues ($M = 0, \ldots, L$) and take as the trial Hamiltonian

$$ H_0(x) = \sum_{M=0}^{L} \delta\Big(M - \sum_{i=1}^{L} s_i\Big)\, H_0^{(M)}(x), \tag{13} $$

where $\delta(\cdot)$ is the Kronecker delta, and $H_0^{(M)}$ is the Hamiltonian restricted to the configurations with $M$ native residues:

$$ H_0^{(M)}(x) = \frac{M - 1}{L - 1} \sum_{i=1}^{L} \epsilon_i\, x_i s_i - T S(s), \tag{14} $$

with $\epsilon_i = (1/2)\, \epsilon \sum_{j} \Delta_{ij}$. Each residue $i$, in a generic configuration with $M$ native residues, feels the interaction $\epsilon_i$ which it would feel in the native state, weakened by a factor $(M - 1)/(L - 1)$ (accounting for the fact that not all the residues are native), times the external field $x_i$, to be fixed by the mean field procedure.

The mean-field equations for this case can be found in Ref. [?].
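
For the standard MFA, the iterative solution of the self-consistent equations can be sketched as below. The sketch assumes that the mean-field energy passed to it (the contact and entropy terms of the variational free energy, without the $RT \sum_i g(m_i)$ term) is linear in each $m_i$ separately, which holds for sums of products of distinct sites. In that case its derivative with respect to $m_i$ is exactly the difference between its values at $m_i = 1$ and $m_i = 0$. Since $g'(u) = \ln[u/(1-u)]$, the stationarity condition becomes $m_i = 1/(1 + \exp(h_i/RT))$ with $h_i$ that derivative. The damping factor `mix`, the function names and the non-interacting toy energy are hypothetical choices for this example.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol K)

def fermi(x):
    """Numerically stable evaluation of 1 / (1 + exp(x))."""
    if x > 0:
        e = math.exp(-x)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(x))

def solve_mean_field(energy, L, T, mix=0.5, tol=1e-10, max_iter=10000):
    """Damped fixed-point iteration for the magnetizations m_i.

    energy: callable taking a list of L magnetizations and returning the
            mean-field energy, assumed linear in each m_i separately.
    """
    m = [0.5] * L
    for _ in range(max_iter):
        target = []
        for i in range(L):
            up = m[:i] + [1.0] + m[i + 1:]
            down = m[:i] + [0.0] + m[i + 1:]
            h = energy(up) - energy(down)  # exact derivative for a multilinear energy
            target.append(fermi(h / (R * T)))
        change = max(abs(a - b) for a, b in zip(target, m))
        m = [(1.0 - mix) * a + mix * b for a, b in zip(m, target)]
        if change < tol:
            break
    return m

# Non-interacting toy case: the energy is sum_i x_i m_i, so the exact
# solution is m_i = 1 / (1 + exp(x_i / RT)).
toy_fields = [0.5, -0.5, 0.0]

def toy_energy(m):
    return sum(x * mi for x, mi in zip(toy_fields, m))

m_star = solve_mean_field(toy_energy, L=3, T=300.0)
```

The damping is a common way to stabilize fixed-point iterations; whether it is needed for the actual model depends on the parameters and is not addressed here.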
FIG. 1: Specific heat in Kcal mol$^{-1}$ K$^{-1}$ (inset) and energy (in Kcal mol$^{-1}$) as a function of temperature, computed through MC simulations (points) and the standard Mean Field Approach (line).

FIG. 2: Fraction of native protein as a function of temperature $T$ (K): MC simulation, standard Mean Field, and Mean Field 3 of Ref. [20], compared with the experimental fit in Ref. [?].



IV. RESULTS AND DISCUSSION 

The folding transition is signalled by the behavior of the specific heat, which develops a peak identifying the folding temperature $T_f$. The position of the standard MF peak is imposed to coincide with the experimental folding temperature in order to fit the parameters; notice, though, that the MC peak is correctly found at the same position, providing a consistency check between the two methods (Fig. 1).

The Pin1 WW domain is reported to be a two-state folder [22]: this is recovered by both the MC and the MF approximations, as can be seen in Fig. 2. MC and the more complicated MF approach reproduce the experimental signal with reasonable accuracy.

The two-state nature of the protein can also be seen in the free-energy profiles, Figs. 3 and 4. It is remarkable that the barrier separating folded from unfolded conformations is quite flat, especially in the MC case, so that mutations could likely induce relevant changes in its position with just a slight change in the energies, a scenario which is indeed suggested in Ref. [?].

FIG. 3: MC free energy profiles, with the energy as the reaction coordinate, at different temperatures: from top to bottom T = 292, 312, 332, 352, 372 K.

FIG. 4: Standard MF free energy profiles, as a function of the number of native residues, at different temperatures: from top to bottom T = 292, 312, 332, 352, 372 K.

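Monte Carlo and Mean Field free energy profiles allow us to estimate the stability gap $\Delta G$ and the folding barrier $\Delta G^{\ddagger}$ as a function of temperature. We compare them with the corresponding experimental curves (Ref. [?]),

$$ \Delta G(T) = \Delta G_0 + \Delta G_1 (T - T_f) + \Delta G_2 (T - T_f)^2, $$
$$ \Delta G^{\ddagger}(T) = \Delta G^{\ddagger}_0 + \Delta G^{\ddagger}_1 (T - T_f) + \Delta G^{\ddagger}_2 (T - T_f)^2, $$

where $T_f = 332$ K, $\Delta G_{0,1,2} = \{-0.062, 0.105, 6.244 \cdot 10^{-4}\}$ Kcal/mol and $\Delta G^{\ddagger}_{0,1,2} = \{5.089, 0.0568, 1.232 \cdot 10^{-3}\}$ Kcal/mol, with $T - T_f$ expressed in K. The result of this comparison is reported in Fig. 5.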

Notice that all methods compare most favorably with the experimental results in the vicinity of $T_f$, which is to be expected, since the model only accounts for the geometry, and not for the details of the interactions, with their temperature dependence in the hydrophobic contributions. MC gives a good estimate of both the stability gap and the barrier, while standard mean field gives a reasonable description of the folding barrier, but overestimates the stability. On the other hand, the more complicated MF scheme correctly recovers the stability, but overestimates the barrier, at least if we consider, as we did in Ref. [20], just the profile of $F_0$ (relying on the good approximation that $F_0$ provides to $F_{var}$), without resorting to the more correct, but computationally expensive, minimization of a constrained $F_{var}$. A more accurate analysis of free energy profiles within this MF scheme is left for future work.

FIG. 5: Folding barrier (top set of curves) and stability of the native state (bottom set) as a function of T, from experiment and simulations. Data are reported in Kcal/mol.

In the following, we analyze standard MF and MC results concerning another important experimental quantity, namely the $\Phi_T$-values (Fig. 6). $\Phi_T$-values are defined as



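$$ \Phi_T = \frac{\partial \Delta G^{\ddagger} / \partial T}{\partial \Delta G / \partial T} = \frac{\Delta S^{\ddagger}}{\Delta S}, \tag{15} $$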



and give an idea of the entropy of the barrier compared to that of the native state, providing a measure of the proximity of the barrier to the folded state. The experimental results show a monotonically increasing, continuous function, spanning a wide range of values. MC and MF results indeed agree in the monotonically increasing behavior, thus reflecting the Hammond behavior [?], even if in a discretized version. Indeed, they show a series of discrete jumps that, in the case of MC simulations, are not simply an effect of the binning in the reaction coordinate, but seem to suggest sharp movements in the barrier position: sudden changes in $\Phi_T$ are in complete correspondence with shifts in the position of the barrier, as reported in Fig. 6.
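
As a worked illustration of the $\Phi_T$ definition, the ratio of temperature derivatives can be evaluated analytically for quadratic fits of the stability and of the barrier around $T_f$. The coefficients below are the experimental fit values quoted in this section, but the exponents of the two quadratic coefficients are read from a damaged source and should be regarded as assumptions. The function names are hypothetical.

```python
T_F = 332.0  # folding temperature in K

# (constant, linear, quadratic) coefficients of the fits around T_F.
# Assumption: the quadratic coefficients are 6.244e-4 and 1.232e-3.
DG_STABILITY = (-0.062, 0.105, 6.244e-4)
DG_BARRIER = (5.089, 0.0568, 1.232e-3)

def quadratic(coeffs, T):
    """Evaluate c0 + c1 (T - T_F) + c2 (T - T_F)^2."""
    x = T - T_F
    return coeffs[0] + coeffs[1] * x + coeffs[2] * x * x

def slope(coeffs, T):
    """Temperature derivative of the quadratic fit."""
    return coeffs[1] + 2.0 * coeffs[2] * (T - T_F)

def phi_T(T):
    """Phi_T as the ratio of the barrier slope to the stability slope."""
    return slope(DG_BARRIER, T) / slope(DG_STABILITY, T)

phi_at_tf = phi_T(T_F)
```

At $T = T_f$ the quadratic terms drop out and $\Phi_T$ reduces to the ratio of the two linear coefficients, $0.0568/0.105 \approx 0.54$, independently of the assumed exponents.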



V. CONCLUSIONS 

The application of the Finkelstein model to the PIN1 WW domain reveals that this model, after fitting the parameter $\epsilon$ in order to reproduce the correct transition temperature, is able to describe correctly the thermodynamics of the folding process, at least in the case of simple two-state behavior. Indeed, the estimate of the folding barrier, both in the MF approximation and in MC simulations, lies within a relative error of about 15% from the experimental estimate over the whole range of experimental measurements. This is indeed interesting, as the model lacks every detail about the nature of the residues, dealing with all atomic contacts in the ground state on the same footing. Moreover, the estimate of the entropy is based on the theory of noninteracting polymers, and neglects possible clashes of the protruding unfolded loops with the folded part of the protein.

FIG. 6: $\Phi_T$-values from experiments and simulations, together with the barrier position for the MC case. Barrier position values at a given T are evaluated as the energy coordinate (x-axis in Fig. 3) corresponding to the barrier top at that temperature, normalized to the total contact energy in the native state (independent of T: E = -53.32 Kcal/mol with our choice of the parameters). Notice how the shifts in $\Phi_T$ correspond to those in the barrier position.

Another important result concerns the $\Phi_T$-values: both MF and MC results recover the non-decreasing nature of the experimental values, with MC providing a better estimate of the slope than MF. Unlike the experimental values, though, the theoretical $\Phi_T$-values increase in a discontinuous fashion, with abrupt changes followed by steady plateaus. This behavior is related to the fact that the transition state is quite broad, so that the actual free-energy maximum, determining the barrier, jumps through different values of the reaction coordinate (the number of native residues or the energy). This is an aspect that deserves further analysis, also because the simple three-state model, with a negligible intermediate, put forward by the author of Ref. [22] does not seem to be able to reproduce the experimental results with sufficient accuracy, and a satisfactory description of the transition state of this protein has still to be found. Probably, it
will require the introduction of residue heterogeneities 
and more accurate studies on the dynamics of the sys- 